Okay, so in this video nugget, we're going to do something very similar to the last one
where we were talking about searching and planning without observations.
Now we add observations to the mix. So the idea is that, whereas before we were doing
conformant planning, we are now doing the same thing for online planning: we add
percepts to the mix. And the most important thing here is that the transition
of the belief state really follows a two-step approach. If we have a non-deterministic
or partially observable world, then we are in a situation where we already have
a belief state on the left here. And something is wrong here. Let me just do this again. There we
have the pointer again. So we have a belief state here on the left. Then we have an action,
which may be non-deterministic, so from one state we might actually get two successor states.
So we get a new belief state after the action. But then we have a percept, and the percept
gives us more information. It partitions the belief state into
smaller belief states, and the percept tells us which of them we are in. So
to do planning and searching with observations, we have to rethink our transition model,
which is what we're going to do here. So if we want to have partial observability,
and I apologize for a lot of math here: if we have a physical problem, the usual thing,
with a set of states, a set of actions, a transition model, and initial and goal states,
then the belief-state search problem is given by, again, the power set of the states,
so the set of all subsets of the states, the actions, the lifted transition model,
the initial belief state, in which we could be in any state, and then the subsets of the goal states.
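Written out, the belief-state search problem just described might look as follows. This is a sketch in assumed notation: the symbols $T^b$ for the lifted transition model and $G^b$ for the belief-level goal set are introduced here for illustration, and reading "any state" as the initial belief state being the full state set $S$ is an assumption.

```latex
% Physical problem: \langle S, A, T, I, G \rangle
% Belief-state search problem (assumed notation):
\[
  \bigl\langle\, \mathcal{P}(S),\; A,\; T^{b},\; S,\; G^{b} \,\bigr\rangle
  \qquad\text{with}\qquad
  G^{b} := \{\, b \in \mathcal{P}(S) \mid b \subseteq G \,\}
\]
```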
And the transition model, we construct in three stages that basically correspond to this idea.
Okay, so we have the prediction stage, where we have some kind of prediction function,
which we consider as being given with the problem, and which takes a belief
state and an action and gives us a belief state. Then we have the observation prediction stage,
which determines the possible percepts that could be observed in the predicted belief state.
So remember that we had percepts with preconditions: remember the open can, where you could only see
the color of the paint in a can if the can is open. So we have to see what the percepts in this
action-updated belief state could be. And then we have the update stage, which for each
possible percept looks at the resulting belief state. Okay, so the prediction stage
essentially gives us a belief state. And that gives us a result function that says
what happens in a belief state B if we do action A: we predict the outcome of the action, we determine
the possible percepts to be the percepts that actually meet the preconditions, and then we
update with respect to those. The update stage is this
update of the belief state, which gives us more information. One thing you can see
if you look at this is that the update of the predicted belief state with
respect to an observation O is always a subset of it. So this picture here is actually correctly drawn.
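The three stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's implementation: the function names, the two-room vacuum world with the textbook eight-state numbering, and the local-sensing percept (robot location plus dirt status at that location) are all introduced here for illustration, and sensing is assumed to be deterministic.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

State = int
Belief = FrozenSet[State]
Percept = Tuple[str, str]

# Assumed numbering of the eight vacuum-world states:
# (robot location, room A dirty?, room B dirty?)
STATES: Dict[State, Tuple[str, bool, bool]] = {
    1: ("A", True, True),   2: ("B", True, True),
    3: ("A", True, False),  4: ("B", True, False),
    5: ("A", False, True),  6: ("B", False, True),
    7: ("A", False, False), 8: ("B", False, False),
}
NUMBER = {desc: n for n, desc in STATES.items()}


def t_right(s: State, a: str) -> Set[State]:
    """Physical transition model, deterministic world (only 'Right' is modelled)."""
    if a != "Right":
        raise ValueError("this sketch only models the action 'Right'")
    _, dirt_a, dirt_b = STATES[s]
    return {NUMBER[("B", dirt_a, dirt_b)]}


def t_slippery(s: State, a: str) -> Set[State]:
    """Slippery world: the move may succeed or leave the robot where it is."""
    return t_right(s, a) | {s}


def perc(s: State) -> Percept:
    """Assumed deterministic sensor model: location and local dirt status."""
    loc, dirt_a, dirt_b = STATES[s]
    dirty = dirt_a if loc == "A" else dirt_b
    return (loc, "Dirty" if dirty else "Clean")


def predict(b: Belief, a: str, t: Callable[[State, str], Set[State]]) -> Belief:
    """Prediction stage: union of the physical successors of every state in b."""
    return frozenset(s2 for s in b for s2 in t(s, a))


def possible_percepts(b_hat: Belief) -> Set[Percept]:
    """Observation prediction stage: percepts that could occur in b_hat."""
    return {perc(s) for s in b_hat}


def update(b_hat: Belief, o: Percept) -> Belief:
    """Update stage: keep the states of b_hat that are consistent with o."""
    return frozenset(s for s in b_hat if perc(s) == o)


def results(b: Belief, a: str, t: Callable[[State, str], Set[State]]) -> Dict[Percept, Belief]:
    """Lifted transition: one successor belief state per possible percept."""
    b_hat = predict(b, a, t)
    return {o: update(b_hat, o) for o in possible_percepts(b_hat)}


# Robot on the left, left room known to be dirty, right room unknown.
b0: Belief = frozenset({1, 3})
det = results(b0, "Right", t_right)      # deterministic world
slip = results(b0, "Right", t_slippery)  # slippery world
```

Returning a dictionary keyed by percept is a design choice of this sketch: it makes it visible that each updated belief state is a subset of the predicted one and that, with deterministic sensing, the values are pairwise disjoint.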
And if we have sensing that is deterministic, then the belief states for the
possible percepts are disjoint, which means we have a partitioning of the
predicted belief state. That's the math of it. And the function pred, which is really the action
prediction model, and PERC, which is the sensor model, are the main parameters in this model.
And those we're going to see from time to time. So let's look at this in a very easy case where
we have the vacuum cleaner. So the two-step thing here is: we have a belief state that says
we believe the robot to be on the left, and we don't know anything about the dirt except that
the left-hand room is dirty. By going right, we end up in the belief state {2, 4}, and then
the sensing partitions this into {2} and {4}. For the slippery vacuum world, things are a
little bit more difficult. We have the same initial belief state, {1, 3}, then we go right, which
in the physical world is a non-deterministic action. So we end up with four
states in the intermediate, predicted belief state. And now we sense: we can sense that B is dirty,
which gives us a singleton. We can sense that A is dirty, which gives us a two-state belief state,
Presenters
Accessible via
Open access
Duration
00:23:49 min
Recording date
2021-01-31
Uploaded on
2021-01-31 19:29:00
Language
en-US